Berkeley’s RISC-V Wants to Be Free (eejournal.com)
103 points by jobstijl 62 days ago | comments



“There are two major products that came from Berkeley: LSD and Unix. We don't believe this to be a coincidence.”

Unix came from Bell Labs, on the East Coast. Californian Unix is just an inelegant derivative, in my opinion (e.g. BSD sockets don't follow the "Unix philosophy", and one explanation could be the LSD, I agree with that).

-----


For that matter, LSD was first synthesized by Albert Hofmann while he was working in Basel, Switzerland.

-----


Could you maybe expand on your thoughts of BSD sockets vs. what SVR4 came up with?

-----


Well, BSD sockets don't use file abstraction. Plan 9 recovered "Unix philosophy" in that regard. In my opinion.

-----


Does anyone have actual experience using RISC-V? It does come with a gcc toolchain, so it should just run C - but is it of any practical use yet, relative to the enormous ARM (or MIPS, or even SPARC) ecosystem?

-----


We were investigating CPUs for embedded use. I'm not sure right now whether it was RISC-V or MIPS. The performance was a fraction of ARM's or x86's, but the consulting company would build an SoC exactly to our spec, and there were zero licensing fees.

So for embedded use if performance is not important, I would seriously consider it.

-----


Reading the article, I see no reason why there should be any major performance issues inherent to RISC-V. Lack of an ecosystem looks like the issue, at least initially. But a $1m licensing fee for a CPU with an established ecosystem is an issue too...

-----


The reason the chips will be slower in practice is that they are produced in small batches, and need to be produced by smaller fabs and, by extension, on much older processes. MIPS is still routinely produced at 90nm.

-----


We at lowRISC intend to produce in volume at 40nm or 28nm. http://www.lowrisc.org/

-----


Looks really interesting. Where are you guys based? Cambridge, UK I'm guessing?

-----


Yes, the core team is based in Cambridge, UK. However, we intend to be a truly open-source project, with an open, distributed development process involving geographically distributed third-party contributors.

-----


That's truly great to hear!

-----


MOSIS offers multi-project wafer (MPW) runs at 28nm from GlobalFoundries.

https://www.mosis.com/products

-----


Yes, and the fact that most of the performance comes from the micro-architecture.

CPU design is all about the tradeoff between size, complexity, speed and power consumption.

New RISC architectures tend to start on the low-end of the scale (clean architecture, short pipeline, small die), and if they are successful, move toward the x86/ugly/big/complex/fast spot.

-----


If you have a Xilinx FPGA, you can program our newly released Rocket core onto it and play around. https://github.com/ucb-bar/rocket-chip

We don't have a production-quality SoC out, so "practical use" is still a ways away.

-----


In your opinion, is it a good idea to replace ZPU with Rocket on Xilinx FPGA? What are advantages and disadvantages?

http://opensource.zylin.com/zpu.htm

-----


The ZPU is designed to use as few FPGA resources as possible. So if you're looking for something to just do a little bit of control logic to tie things together, it probably is the way to go.

Rocket, on the other hand, is a full-fledged in-order core with a hardware multiplier, single- and double-precision floating-point units, and an MMU. Therefore, it can do things like boot a full Linux kernel, which the ZPU cannot. Rocket also has the RoCC co-processor interface, which allows you to define your own ISA extensions in Chisel.

-----


Why redesign a new core? We have plenty to choose from that have semi-existing ecosystems.

First, I'd just go with a MIPS core. This is effectively the original RISC core, after all. Most patents have expired, and the Chinese have an implementation.

If I didn't do that, I'd probably go with a DEC Alpha core. A shrunk, voltage optimized Alpha core was what drove the StrongARM series, and people raved about it. The Alpha architecture up through EV5 is quite clean.

Given that most data suggests ecosystem and optimization are more important than the base architecture, reinventing the wheel is not a good idea here.

-----


Unfortunately the Alpha core is unlikely to ever return, much as its fans would like.

SPARC is open as of the T1/T2 era, pre-Oracle takeover. That's not a bad place to go, but MIPS is not really ideal due to a number of factors that make the Chinese clones less than appealing.

-----


Is there a picture of RISC-V architecture (with pipeline, ALUs, FPUs)? Couldn't find anything.

-----


You're asking for a picture of a particular RISC-V implementation and its chosen microarchitecture. The RISC-V ISA has a range of different implementations each hitting different points in the design space and different trade-offs in terms of area/performance. See for instance the 'Sodor' teaching cores https://github.com/ucb-bar/riscv-sodor https://github.com/ucb-bar/riscv-sodor/wiki which demonstrate 1-stage, 2-stage, 3-stage, 5-stage, and micro-coded implementations.

-----


Oh man, that's cool. I must have just missed out on this when taking CS152, we played with simulated SPARC when I took the course IIRC.

-----


Thanks.

-----


From what I've heard, the ARM instruction set is also not as simple as it could be. So a new, fresh, clean instruction set would be great, and the comparison of the two architectures looks promising.

I skimmed the paper, but didn't find the number of registers. Does anybody know how many registers this architecture has/supports ... and is there any implementation yet?

I hope there will be an implementation soon, and maybe a smartphone or tablet based on it... (or a new Raspberry Pi?)

Edit: Now I see it should be 32 registers, based on the 32-bit opcode format (5 bits to address any register).

-----


> From what I've heard, the ARM instruction set is also not as simple as it could be.

I'd argue that's part of why it's been so successful, especially in competition with the other big ISA, x86. Complex instructions are good because they do a lot in a small encoding, and a small encoding is good because it conserves cache space and memory bandwidth; decoding is rarely the bottleneck now. That's why ARM has Thumb, and why their latest cores are similar to x86's uop decoding, with decoupled decoder and execution units.

The whole "it can be done faster in software" idea of RISC was mainly based on some early CISC microarchitectures that weren't implemented as efficiently as they could have been, which doesn't imply that they couldn't be. Look at all the instructions added by the various SSE extensions, for example; the performance improvements they provide are real, and the cost of the hardware is basically negligible in comparison.

Here's an interesting comparison where ARM and x86 are quite close in power efficiency, while MIPS is relatively far behind: http://www.extremetech.com/extreme/188396-the-final-isa-show...

-----


Well, the other angles were "what can we fit on a die?" and the relative speeds of single die CPUs and memory.

These have changed throughout history. When I was reading the Wikipedia CDC 6600 page, it pointed out that the magnetic core memory of the time could be much faster than germanium-transistor CPUs (we're talking about the days when transistors had serial numbers!), and the silicon transistors the 6600 was built with were fast enough that, with clever architecture (a 60-bit floating-point multiply or divide still took many more cycles than a fast memory cycle) and compilers, a theoretical 3 MFLOPS resulted in an achievable 0.5 MFLOPS in FORTRAN. My first job, in 1980, was supporting applied mathematicians who ran their FORTRAN on CDC and Cray hardware at NCAR, a couple of thousand miles away. One thing they were always concerned with was writing their array manipulations in a way the compiler would recognize and map onto the vector hardware.

Back then there was a huge premium on getting a serious CPU onto a single die, and I think the speed mismatch with DRAM wasn't nearly as severe.

-----


I'm not sure the ExtremeTech page allows concluding much on its own regarding MIPS efficiency vs. the others, as the implementations are pretty different and the only MIPS instance is an oddball: it's OoO and 4-wide like the i7, while being on a very old process (90nm), whereas all the other chips are on better processes, the 3- or 4-wide ones on 32nm. Three process generations tend to have an impact on efficiency ;)

Also on this topic: while the original MIPS was indeed very simple, recent versions added Thumb-like code compression, SIMD extensions, DSP extensions... as everyone else does, really. And you can get CPU benchmark levels similar to other architectures. Then again, the very same instruction set architecture can show pretty wide power-efficiency differences depending on the implementation. Just look at Atom vs. Core, or A8 vs. A15... Even the same model shows big differences based on implementation choices (optimizing for speed vs. area/cost, for example).

As someone working on SoCs, I believe the key difference between MIPS, ARM and Intel is not so much the instruction set as the effort spent optimizing the implementations for the processes. It's pretty obvious for Intel, but ARM also spends a lot of time with the big foundries (TSMC, GlobalFoundries, Samsung) to make sure their implementations are well tuned, and they pass the know-how on to their customers through what they call "POPs" (Processor Optimization Packs). This is important for ARM, but also for the foundries, as ARM is the de facto standard in mobile. A lot of work goes into this, and it shows at the high end. MIPS/Imagination recently proposed something similar for MIPS, but they're far from having the same resources. That's the main difference between MIPS and ARM at the high end, IMHO.

-----


It would have been nice if they had a MIPS core actually designed by MIPS.

-----


> Hope, there will be an implementation soon

It's quite possible: http://www.lowrisc.org/

-----


Per that page, "multiple high performance [RISC-V] chips have been produced, at 45 and 28nm", which is very encouraging.

More from the page, enough to get me interested:

On performance: "As a rough guide we would expect ~500-1GHz at 40nm and ~1.0-1.5GHz at 28nm."

Is volume fabrication feasible?

Yes. There are a number of routes open to us. Early production runs are likely to be done in batches of ~25 wafers. This would yield around 100-200K good chips per batch. We expect to produce packaged chips for less than $10 each.

Looking at this slide show, https://speakerdeck.com/asb/lowrisc-a-first-look I see they're doing some very interesting things. Besides a lot of the usual stuff (2 cores, each with I+D L1 caches and a shared L2 cache), they've added tags!

2 bits of tags, and they're thinking of general-purpose uses (e.g. GC) as well as security. That's big: something we haven't seen in hardware since the Lisp Machines, to my knowledge.

Also, I/O "minions": non-coherent RISC-V cores, like the CDC 6600's peripheral processors, etc.

I'm now quite interested in the project.

-----


> 2 bits of tags, and they're thinking general purposes (e.g. GC) as well as security. That's big, something we haven't seen in hardware since Lisp Machines, to my knowledge.

The IBM AS/400, or whatever it is called this week, uses tags for security too. (A bit is flipped if a pointer is written to, thereby preventing pointers from being forged.) The PowerPC chip has optional support for tags because of this.

Source: http://www.amazon.com/Inside-AS-400-Second-Edition/dp/188241...

-----


Thanks for your interest, we're hoping to share more very soon.

-----


You're very welcome.

Read and write barrier instructions could be very nice for GC, but those can be done at the page level, I think (I would look at Azul's Pauseless collector for one state-of-the-art custom-hardware approach, and their C4 for doing without on x86_64).

But I'm more interested in tags for dynamically typed languages like Lisp. You don't need many bits: Symbolics used only 4 "hard" ones in the 36-bit 3600 line, which allowed for immediate 32-bit integers and floating-point numbers, then another 4, leaving 28 bits for word-based addressing.

In modern byte-addressed CPUs, you can use the least significant bits that are 0 in an aligned pointer for tagging, so the 2 "hard" bits you propose are enough to work with. Basically, as long as we don't have to box floats!

Although it would be nice if we could set things up so that e.g. floating point operations could proceed at full speed, generating a fault if the tag bits on the operands aren't correct ... although superscalar CPUs can do both in parallel without "hard" tags. Are your two main cores going to be superscalar?

And I guess to finish my wishlist, a potential for a lot of ECC DRAM would be nice, at least 16 GiB.

ADDED: the flip side is, don't let Second System Syndrome sink this effort (the Raspberry Pi being the first; just going to your own new-architecture SoC is a really big step!).

-----


The 2-4 main cores will be derived from the UC Berkeley Rocket core generator https://github.com/ucb-bar/rocket. Currently this is single issue, in-order though it is designed to be a dual-issue in-order core (i.e. superscalar).

-----


Errr, correct me if I'm wrong, but as I read the v2 RISC-V spec (starting with the absence of a flags register!), plus some quality time on Google, RISC-V has no provision for integer computation errors besides divide by zero (which can be checked for). From the spec:

"We did not include special instruction set support for overflow checks on integer arithmetic operations. Most popular programming languages do not support checks for integer overflow, partly because most architectures impose a significant runtime penalty to check for overflow on integer arithmetic and partly because modulo arithmetic is sometimes the desired behavior."

!!!

Or: RISC-V is a Worse is Better (https://en.wikipedia.org/wiki/Worse_is_better), New Jersey-style processor, following the exact same model of the primacy of implementation simplicity.

Well, especially since we aren't going to get people to use "safe" languages prior to e.g. software errors killing 4-6 figures of people, I guess it's a really good thing you're adding tag bits that can be used to improve security in unsafe languages.

On the other hand, as a CPU for The Right Way operating systems and languages like Lisp, RISC-V would seem to be ... subpar, especially since who knows what other corners have or will be cut. (Yeah, I know you can check, and for Lisps, it would be fairly cheap to have fixnums be 32 to 29 bits (subtracting LSB bits for more tagging) and then promote them to bignums, but....)

While I think you're doing a very good thing for the world as it is, my personal interest has dropped ... but that's OK, I've got a couple of years to muse about this as you develop your design and get it into production.

-----


I am personally very sympathetic to concerns about the efficiency of overflow checking. There has been some discussion of this on the riscv-hw mailing list, e.g. https://lists.riscv.org/lists/arc/hw-dev/2014-09/msg00007.ht.... The current position of the Berkeley team is that an overflow check just adds a single, rarely taken branch, which can easily be predicted. If new data is produced (such as evaluating a wider range of instruction traces, e.g. those from programs not written in C), there may well be an argument for new instruction set additions. There is a plan for some sort of RISC-V consortium with representation from all major implementors, though this will take a while to develop. I imagine we'll be discussing this at the RISC-V Workshop, 14th-15th Jan in CA: http://riscv.org/workshop/

-----


> the current position of the Berkeley team is that overflow checking just adds a single rarely taken branch, which can easily be predicted

I find this argument VERY weak: you need many of these 'rarely taken branch' overflow-checking instructions, and they pollute the instruction cache.

-----


And that's also assuming that all of the necessary checks will in fact be added!

-----


I don't think this part is really an issue, if the compiler adds the checks by default and only removes them very conservatively.

-----


31, plus a register zero that is hardwired to zero.

[by the way, the PDF at http://riscv.org/download.html#tab_isaspec is clear reading]

-----


Thanks. I think this spec is too long for me to read right now.

I am wondering what the advantage of a zero register is. I thought about supporting direct operations, but adding zero doesn't make much sense, same with subtraction, multiplication, division... OR/AND/XOR don't make much sense either. Storing data into this register doesn't make much sense either ... or as a garbage tray??? ;) (a hardware-implemented garbage tray was once an April Fools' joke in an IT magazine).

-----


If you take a look at the master's thesis on designing the compressed opcode format[1] (expressing instructions in a 16-bit encoding, similar to ARM Thumb, to increase the efficiency of instruction-cache utilization), you'll see that they counted which registers were used most often in order to determine which ones to make accessible in the compressed format. The zero register turns out to be the 5th most common, making up approximately 9% of all register references in instructions. Dynamically, in terms of instructions actually executed as opposed to merely present, it's a little less common, closer to 4%, but still common enough to be worth choosing as one of the 8 registers accessible in the compressed format. (Actually, they only allow it for some operands of particular instructions where it is particularly useful, allowing other operands access to one more real register.)

It's useful for several things. One is as a no-op instruction; as you point out, AND, OR, XOR, etc. with zero are no-ops, and no-ops are sometimes useful for achieving certain alignment in code, or leaving space for instructions to be patched out with other code (which is common for debugging, tracing, and hot-patching of code).

It can also be used for copying from one register to another without a dedicated instruction: "ADD rd, rs, x0" is a way to write "MV rd, rs", so you don't need to spend an opcode on that.

0 is also one of the most common values that you want to compare against (many loops can be turned into decrementing a value until it becomes zero, or you're just walking along a data structure or string until you encounter a null value), and so their branch instructions work by comparing two registers and branching to a particular offset. Having a dedicated 0 register means you can always do a compare against 0 and branch in a single instruction, without having to have separate instructions for comparisons against immediates versus comparison of two registers.

[1]: http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-63...

-----


I did not think about comparisons ... that is right.

The other examples are also valid, if rather specific to RISC architectures ... but of course they do the trick.

-----


With a dedicated zero register it's possible to implement some useful instructions as assembler pseudoinstructions and not on the CPU itself, saving chip area.

For example, the original MIPS instruction set has these pseudoinstructions:

- li $d, imm (load immediate) is replaced with ori $d, $0, imm (or immediate with the zero register)

- not $d, $s (bitwise complement) is replaced with nor $d, $0, $s (bitwise NOR with the zero register)

... and some others as well.

-----


ARM have a real habit of churning out ISAs on an as-needed basis, such as the Thumb stuff.

The uncomfortable reality here, though, is that this is just another project from the ultimately failed MIPS/SPARC view of the world. ARM has little to zero academic credibility, as it is really just a jumped-up 6502. This lack of any sort of theoretical purity is really their big strength.

-----


> ARM has little to zero academic credibility as it is really just a jumped up 6502.

I think that statement is taking a few more shortcuts than are warranted.

-----


This could be used for stuff like Arduino at least.

-----



